21 research outputs found

    Disease Name Extraction from Clinical Text Using Conditional Random Fields

    Get PDF
    The aim of the research done in this thesis was to extract disease and disorder names from clinical texts. We utilized Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences. We used some other tools such as MetaMap and Stanford Core NLP tool to extract some crucial features. MetaMap tool was used to identify names of diseases/disorders that are already in UMLS Metathesaurus. Some other important features such as lemmatized versions of words, and POS tags were extracted using the Stanford Core NLP tool. Some more features were extracted directly from UMLS Metathesaurus, including semantic types of words. We participated in the SemEval 2014 competition\u27s Task 7 and used its provided data to train and evaluate our system. Training data contained 199 clinical texts, development data contained 99 clinical texts, and the test data contained 133 clinical texts, these included discharge summaries, echocardiogram, radiology, and ECG reports. We obtained competitive results on the disease/disorder name extraction task. We found through ablation study that while all features contributed, MetaMap matches, POS tags, and previous and next words were the most effective features

    Unsupervised Biomedical Named Entity Recognition

    Get PDF
    Named entity recognition (NER) from text is an important task for several applications, including in the biomedical domain. Supervised machine learning based systems have been the most successful on NER task, however, they require correct annotations in large quantities for training. Annotating text manually is very labor intensive and also needs domain expertise. The purpose of this research is to reduce human annotation effort and to decrease cost of annotation for building NER systems in the biomedical domain. The method developed in this work is based on leveraging the availability of resources like UMLS (Unified Medical Language System), that contain a list of biomedical entities and a large unannotated corpus to build an unsupervised NER system that does not require any manual annotations. The method that we developed in this research has two phases. In the first phase, a biomedical corpus is automatically annotated with some named entities using UMLS through unambiguous exact matching which we call weakly-labeled data. In this data, positive examples are the entities in the text that exactly match in UMLS and have only one semantic type which belongs to the desired entity class to be extracted (for example, diseases and disorders). Negative examples are the entities in the text that exactly match in UMLS but are of semantic types other than those that belong to the desired entity class. These examples are then used to train a machine learning classifier using features that represent the contexts in which they appeared in the text. The trained classifier is applied back to the text to gather more examples iteratively through the process of self-training. The trained classifier is then capable of classifying mentions in an unseen text as of the desired entity class or not from the contexts in which they appear. Although the trained named entity detector is good at detecting the presence of entities of the desired class in text, it cannot determine their correct boundaries. In the second phase of our method, called “Boundary Expansion”, the correct boundaries of the entities are determined. This method is based on a novel idea that utilizes machine learning and UMLS. Training examples for boundary expansion are gathered directly from UMLS and do not require any manual annotations. We also developed a new WordNet based approach for boundary expansion. Our developed method was evaluated on three datasets - SemEval 2014 Task 7 dataset that has diseases and disorders as the desired entity class, GENIA dataset that has proteins, DNAs, RNAs, cell types, and cell lines as the desired entity classes, and i2b2 dataset that has problems, tests, and treatments as the desired entity classes. Our method performed well and obtained performance close to supervised methods on the SemEval dataset. On the other datasets, it outperformed an existing unsupervised method on most entity classes. Availability of a list of entity names with their semantic types and a large unannotated corpus are the only requirements of our method to work well. Given these, our method generalizes across different types of entities and different types of biomedical text. Being unsupervised, the method can be easily applied to new NER tasks without needing costly annotations

    Learning for clinical named entity recognition without manual annotations

    Get PDF
    Background: Named entity recognition (NER) systems are commonly built using supervised methods that use machine learning to learn from corpora manually annotated with named entities. However, manually annotating corpora is very expensive and laborious. Materials and methods: In this paper, a novel method is presented for training clinical NER systems that does not require any manual annotations. It only requires a raw text corpus and a resource like UMLS that can give a list of named entities along with their semantic types. Using these two resources, annotations are automatically obtained to train machine learning methods. The method was evaluated on the NER shared-task datasets of i2b2 2010 and SemEval 2014. Results: On the SemEval 2014 dataset for recognizing diseases and disorders, the method obtained F-measure of 0.693 for exact matching and of 0.773 allowing overlaps. This is comparable to many supervised systems in the past that had used manual annotations for training. On the i2b2 2010 dataset for recognizing problems, tests and treatments, the method obtained F-measures of 0.451, 0.338 and 0.204 respectively for exact matching and of 0.721, 0.587 and 0.475 respectively allowing overlaps. These results are better than an existing unsupervised method. Conclusions: Experiments on standard datasets showed that the new method performed well. The method is general and could be applied to recognize entities of other types on other genres of text without needing manual annotations

    Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019

    Get PDF
    Background: In an era of shifting global agendas and expanded emphasis on non-communicable diseases and injuries along with communicable diseases, sound evidence on trends by cause at the national level is essential. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) provides a systematic scientific assessment of published, publicly available, and contributed data on incidence, prevalence, and mortality for a mutually exclusive and collectively exhaustive list of diseases and injuries. Methods: GBD estimates incidence, prevalence, mortality, years of life lost (YLLs), years lived with disability (YLDs), and disability-adjusted life-years (DALYs) due to 369 diseases and injuries, for two sexes, and for 204 countries and territories. Input data were extracted from censuses, household surveys, civil registration and vital statistics, disease registries, health service use, air pollution monitors, satellite imaging, disease notifications, and other sources. Cause-specific death rates and cause fractions were calculated using the Cause of Death Ensemble model and spatiotemporal Gaussian process regression. Cause-specific deaths were adjusted to match the total all-cause deaths calculated as part of the GBD population, fertility, and mortality estimates. Deaths were multiplied by standard life expectancy at each age to calculate YLLs. A Bayesian meta-regression modelling tool, DisMod-MR 2.1, was used to ensure consistency between incidence, prevalence, remission, excess mortality, and cause-specific mortality for most causes. Prevalence estimates were multiplied by disability weights for mutually exclusive sequelae of diseases and injuries to calculate YLDs. We considered results in the context of the Socio-demographic Index (SDI), a composite indicator of income per capita, years of schooling, and fertility rate in females younger than 25 years. Uncertainty intervals (UIs) were generated for every metric using the 25th and 975th ordered 1000 draw values of the posterior distribution. Findings: Global health has steadily improved over the past 30 years as measured by age-standardised DALY rates. After taking into account population growth and ageing, the absolute number of DALYs has remained stable. Since 2010, the pace of decline in global age-standardised DALY rates has accelerated in age groups younger than 50 years compared with the 1990–2010 time period, with the greatest annualised rate of decline occurring in the 0–9-year age group. Six infectious diseases were among the top ten causes of DALYs in children younger than 10 years in 2019: lower respiratory infections (ranked second), diarrhoeal diseases (third), malaria (fifth), meningitis (sixth), whooping cough (ninth), and sexually transmitted infections (which, in this age group, is fully accounted for by congenital syphilis; ranked tenth). In adolescents aged 10–24 years, three injury causes were among the top causes of DALYs: road injuries (ranked first), self-harm (third), and interpersonal violence (fifth). Five of the causes that were in the top ten for ages 10–24 years were also in the top ten in the 25–49-year age group: road injuries (ranked first), HIV/AIDS (second), low back pain (fourth), headache disorders (fifth), and depressive disorders (sixth). In 2019, ischaemic heart disease and stroke were the top-ranked causes of DALYs in both the 50–74-year and 75-years-and-older age groups. Since 1990, there has been a marked shift towards a greater proportion of burden due to YLDs from non-communicable diseases and injuries. In 2019, there were 11 countries where non-communicable disease and injury YLDs constituted more than half of all disease burden. Decreases in age-standardised DALY rates have accelerated over the past decade in countries at the lower end of the SDI range, while improvements have started to stagnate or even reverse in countries with higher SDI. Interpretation: As disability becomes an increasingly large component of disease burden and a larger component of health expenditure, greater research and developm nt investment is needed to identify new, more effective intervention strategies. With a rapidly ageing global population, the demands on health services to deal with disabling outcomes, which increase with age, will require policy makers to anticipate these changes. The mix of universal and more geographically specific influences on health reinforces the need for regular reporting on population health in detail and by underlying cause to help decision makers to identify success stories of disease control to emulate, as well as opportunities to improve. Funding: Bill & Melinda Gates Foundation. © 2020 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 licens

    Mapping local patterns of childhood overweight and wasting in low- and middle-income countries between 2000 and 2017

    Get PDF
    A double burden of malnutrition occurs when individuals, household members or communities experience both undernutrition and overweight. Here, we show geospatial estimates of overweight and wasting prevalence among children under 5 years of age in 105 low- and middle-income countries (LMICs) from 2000 to 2017 and aggregate these to policy-relevant administrative units. Wasting decreased overall across LMICs between 2000 and 2017, from 8.4% (62.3 (55.1–70.8) million) to 6.4% (58.3 (47.6–70.7) million), but is predicted to remain above the World Health Organization’s Global Nutrition Target of <5% in over half of LMICs by 2025. Prevalence of overweight increased from 5.2% (30 (22.8–38.5) million) in 2000 to 6.0% (55.5 (44.8–67.9) million) children aged under 5 years in 2017. Areas most affected by double burden of malnutrition were located in Indonesia, Thailand, southeastern China, Botswana, Cameroon and central Nigeria. Our estimates provide a new perspective to researchers, policy makers and public health agencies in their efforts to address this global childhood syndemic

    Reciprocal Alternations in Persian

    No full text
    An alternation is defined as a pair of sentences with more or less identical structures and the same meaning. These alternations are sensitive to the meaning component of verbs. Therefore, it can be used as a criterion for classifying verbs in an effective way. Levin classified English verbs into 49 broad semantic classes and 192 subclasses, introducing 79 argument alternations. She believes that various aspects of the syntactic behavior of verbs are tied to their meaning. Moreover, verbs that fall into classes according to shared behavior would be expected to show shared meaning components. The present study aimed to examine a type of alternations introduced by Levin called “Reciprocal Alternation”, which itself includes several types of alternations. One of these alternations, known as “Understood Reciprocal Object Alternation”, is a type of “Transitive Alternations”. Transitive alternations include alternations involving a change in a verb’s transitivity. The other types of Reciprocal Alternation are the alternations that occur without a change in a verb’s transitivity. These alternations include “Simple Reciprocal Alternation”, “Together Reciprocal Alternation”, and “Apart Reciprocal Alternation”. A corpus-based study of 3070 Persian verbs revealed that all these alternations are also found in Persian. Furthermore, two new types of these alternation named “Reciprocal Chaining Alternation” and “Reciprocal Collective Alternation” were introduced in the present study

    Learning for clinical named entity recognition without manual annotations

    Get PDF
    Background: Named entity recognition (NER) systems are commonly built using supervised methods that use machine learning to learn from corpora manually annotated with named entities. However, manually annotating corpora is very expensive and laborious. Materials and methods: In this paper, a novel method is presented for training clinical NER systems that does not require any manual annotations. It only requires a raw text corpus and a resource like UMLS that can give a list of named entities along with their semantic types. Using these two resources, annotations are automatically obtained to train machine learning methods. The method was evaluated on the NER shared-task datasets of i2b2 2010 and SemEval 2014. Results: On the SemEval 2014 dataset for recognizing diseases and disorders, the method obtained F-measure of 0.693 for exact matching and of 0.773 allowing overlaps. This is comparable to many supervised systems in the past that had used manual annotations for training. On the i2b2 2010 dataset for recognizing problems, tests and treatments, the method obtained F-measures of 0.451, 0.338 and 0.204 respectively for exact matching and of 0.721, 0.587 and 0.475 respectively allowing overlaps. These results are better than an existing unsupervised method. Conclusions: Experiments on standard datasets showed that the new method performed well. The method is general and could be applied to recognize entities of other types on other genres of text without needing manual annotations. Keywords: Named entity recognition, Clinical text, Machine learning, Natural language processing, Information extractio

    The relationship between different fatty acids intake and frequency of migraine attacks

    No full text
    Background: Migraine is a primary headache disorder that affects the neurovascular system. Recent studies have shown that consumption of some fatty acids such as omega-3 fatty acids improves migraine symptoms. The aim of the present study is to assess the association between usual intake of fatty acids such as eicosapentaenoic acid (EPA), docosahexaenoic acid (DHA), and saturated fatty acids (SFA) with the frequency of migraine attacks. Materials and Methods: 105 migraine patients with age ranging from 15 to 50 years participated in this cross-sectional study. Usual dietary consumption was assessed by using a semi-quantitative food frequency questionnaire (FFQ). Moreover, frequency of migraine attacks during 1 month period was determined in all participants. Data had been analyzed using independent sample t-test and linear regression test with adjustment of confounding variables. Results: In this study, we found that lower intake of EPA (β = −335.07, P = 0.006) and DHA (β = −142.51, P = 0.001) was associated with higher frequency of migraine attacks. In addition, we observed similar relationship either in men or women. No significant association was found between dietary intake of SFA and the frequency of migraine attacks (β = −0.032, P = 0.85). Conclusions: Frequency of migraine attacks was negatively associated with dietary intake of omega-3 polyunsaturated fatty acids. No significant relationship was found between SFA intake and migraine frequency. Further studies are required to shed light on our findings

    Assessment of pyridoxine and folate intake in migraine patients

    No full text
    Background: Migraine is a highly prevalent disorder worldwide. It affects 10–20% of the population during their lifetime. Recent studies have indicated that supplementation with folate and pyridoxine improves migraine symptoms. This study was undertaken to evaluate dietary intake of folate and pyridoxine in migraine patients and assessed their association with the frequency of migraine attacks. Materials and Methods: This is a case–control study performed on 124 migraine patients and 130 non-migraine subjects. Individuals' common dietary intake was determined by using a valid semi-quantitative 168-item food frequency questionnaire (FFQ). Data had been analyzed using independent t-test using SPSS software (version 18). Results: In this study, we found that migraine patients had lower intake of dietary folate compared with control group, but energy and pyridoxine intake were not different between the two groups. Further analysis among men and women revealed no statistically significant changes in these relationships. In addition, we found no significant association between dietary intake of pyridoxine and folate with the frequency of migraine attacks. Conclusion: Migraine patients had lower dietary intake of folate, compared with non-migraine group subjects. There was no significant association between folate and pyridoxine intake with the frequency of migraine attacks. Further studies are needed to confirm our findings
    corecore